AAAI.2017 - AI and the Web

#1 Multiple Source Detection without Knowing the Underlying Propagation Model

Authors: Zheng Wang ; Chaokun Wang ; Jisheng Pei ; Xiaojun Ye

Information source detection, which is the reverse problem of information diffusion, has attracted considerable research effort recently. Most existing approaches assume that the underlying propagation model is fixed and given as input, which may limit their application range. In this paper, we study the multiple source detection problem when the underlying propagation model is unknown. Our basic idea is source prominence, namely the nodes surrounded by larger proportions of infected nodes are more likely to be infection sources. As such, we propose a multiple source detection method called Label Propagation based Source Identification (LPSI). Our method lets infection status iteratively propagate in the network as labels, and finally uses local peaks of the label propagation result as source nodes. In addition, both the convergent and iterative versions of LPSI are given. Extensive experiments are conducted on several real-world datasets to demonstrate the effectiveness of the proposed method.
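The label-propagation idea described above can be sketched as follows. This is only an illustration, not the authors' implementation: the update rule, the +1/-1 initial labels, the symmetric degree normalisation, and the names `lpsi_sketch`, `alpha` and `iterations` are all assumptions made for this example.

```python
from math import sqrt

def lpsi_sketch(adj, infected, alpha=0.5, iterations=50):
    """Hypothetical sketch: propagate infection labels, then pick local peaks.

    adj      -- dict mapping each node to the set of its neighbours (undirected)
    infected -- set of nodes observed as infected
    """
    # Assumed initial labels: +1 for infected nodes, -1 for uninfected nodes.
    y = {v: (1.0 if v in infected else -1.0) for v in adj}
    deg = {v: len(adj[v]) for v in adj}
    g = dict(y)
    for _ in range(iterations):
        new_g = {}
        for v in adj:
            # Neighbour labels, weighted by a symmetric degree normalisation.
            spread = sum(g[u] / sqrt(deg[u] * deg[v]) for u in adj[v])
            # Blend propagated labels with the original observation.
            new_g[v] = alpha * spread + (1 - alpha) * y[v]
        g = new_g
    # Report infected nodes whose score exceeds that of every neighbour.
    sources = {v for v in infected if all(g[v] > g[u] for u in adj[v])}
    return sources, g

# Path graph 0-1-2-3-4 with the three middle nodes infected.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
found, scores = lpsi_sketch(path, {1, 2, 3})
print(found)
```

In this toy graph the middle node is the only infected node whose neighbours are all infected, so it ends up as the single local peak.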

#2 Radon – Rapid Discovery of Topological Relations

Authors: Mohamed Sherif ; Kevin Dreßler ; Panayiotis Smeros ; Axel-Cyrille Ngonga Ngomo

Geospatial data is at the core of the Semantic Web, of which the largest knowledge base contains more than 30 billion facts. Reasoning on these large amounts of geospatial data requires efficient methods for the computation of links between the resources contained in these knowledge bases. In this paper, we present Radon, an efficient solution for the discovery of topological relations between geospatial resources according to the DE-9IM standard. Our evaluation shows that we outperform the state of the art significantly and by several orders of magnitude.

#3 CLARE: A Joint Approach to Label Classification and Tag Recommendation

Authors: Yilin Wang ; Suhang Wang ; Jiliang Tang ; Guojun Qi ; Huan Liu ; Baoxin Li

Data classification and tag recommendation are both important and challenging tasks in social media. These two tasks are often considered independently and most efforts have been made to tackle them separately. However, labels in data classification and tags in tag recommendation are inherently related. For example, a YouTube video annotated with NCAA, stadium, pac12 is likely to be labeled as football, while a video/image with the class label of coast is likely to be tagged with beach, sea, water and sand. The existence of relations between labels and tags motivates us to jointly perform classification and tag recommendation for social media data in this paper. In particular, we provide a principled way to capture the relations between labels and tags, and propose a novel framework CLARE, which fuses data CLAssification and tag REcommendation into a coherent model. With experiments on three social media datasets, we demonstrate that the proposed framework CLARE achieves superior performance on both tasks compared to the state-of-the-art methods.

#4 Treatment Effect Estimation with Data-Driven Variable Decomposition

Authors: Kun Kuang ; Peng Cui ; Bo Li ; Meng Jiang ; Shiqiang Yang ; Fei Wang

One fundamental problem in causal inference is treatment effect estimation in observational studies when variables are confounded. Confounding is generally controlled for with the propensity score. However, this approach treats all observed variables as confounders and ignores the adjustment variables, which have no influence on treatment but are predictive of the outcome. Recently, it has been demonstrated that the adjustment variables are effective in reducing the variance of the estimated treatment effect. However, how to automatically separate the confounders and adjustment variables in observational studies is still an open problem, especially in scenarios with high-dimensional variables, which are common in the big data era. In this paper, we propose a Data-Driven Variable Decomposition (D$^2$VD) algorithm, which can 1) automatically separate confounders and adjustment variables with a data-driven approach, and 2) simultaneously estimate treatment effect in observational studies with high-dimensional variables. Under standard assumptions, we show experimentally that the proposed D$^2$VD algorithm can automatically separate the variables precisely, and estimate treatment effect more accurately and with tighter confidence intervals than the state-of-the-art methods on both synthetic data and a real online advertising dataset.

#5 Phrase-Based Presentation Slides Generation for Academic Papers

Authors: Sida Wang ; Xiaojun Wan ; Shikang Du

Automatic generation of presentation slides for academic papers is a very challenging task. Previous methods for addressing this task are mainly based on document summarization techniques: they extract document sentences to form presentation slides, which are neither well structured nor concise. In this study, we propose a phrase-based approach to generate well-structured and concise presentation slides for academic papers. Our approach first extracts phrases from the given paper, and then learns both the saliency of each phrase and the hierarchical relationship between a pair of phrases. Finally, a greedy algorithm is used to select and align the salient phrases in order to form the well-structured presentation slides. Evaluation results on a real dataset verify the efficacy of our proposed approach.

#6 Finding Critical Users for Social Network Engagement: The Collapsed k-Core Problem

Authors: Fan Zhang ; Ying Zhang ; Lu Qin ; Wenjie Zhang ; Xuemin Lin

In social networks, the departure of critical users may significantly break network engagement, i.e., lead a large number of other users to drop out. A popular model to measure social network engagement is k-core, the maximal induced subgraph in which every vertex has at least k neighbors. To identify critical users for social network engagement, we propose the collapsed k-core problem: given a graph G, a positive integer k and a budget b, we aim to find b vertices in G such that the deletion of the b vertices leads to the smallest k-core. We prove the problem is NP-hard. Then, an efficient algorithm is proposed, which significantly reduces the number of candidate vertices to speed up the computation. Our comprehensive experiments on 9 real-life social networks demonstrate the effectiveness and efficiency of our proposed method.
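To make the problem statement concrete, the sketch below computes a k-core by repeatedly peeling low-degree vertices and then solves the collapsed k-core problem by exhaustive search. The exhaustive search is a stand-in chosen for clarity, not the efficient algorithm the abstract refers to, and the function names and the example graph are hypothetical.

```python
from itertools import combinations

def k_core(adj, k, removed=()):
    """Vertex set of the k-core after deleting the vertices in `removed`."""
    alive = set(adj) - set(removed)
    deg = {v: len(adj[v] & alive) for v in alive}
    stack = [v for v in alive if deg[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already peeled via another neighbour
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                if deg[u] < k:
                    stack.append(u)
    return alive

def collapsed_k_core_bruteforce(adj, k, b):
    """Try every set of b vertices; exponential, so only for tiny graphs."""
    best = min(combinations(sorted(adj), b),
               key=lambda chosen: len(k_core(adj, k, chosen)))
    return set(best), k_core(adj, k, best)

# A 4-cycle (0-1-2-3) and a triangle (3-4-5) that share vertex 3.
graph = {
    0: {1, 3}, 1: {0, 2}, 2: {1, 3},
    3: {0, 2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(collapsed_k_core_bruteforce(graph, k=2, b=1))
```

With k = 2 and b = 1, deleting the shared vertex 3 breaks both cycles, so every remaining vertex is peeled and the 2-core becomes empty.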

#7 Correlated Cascades: Compete or Cooperate

Authors: Ali Zarezade ; Ali Khodadadi ; Mehrdad Farajtabar ; Hamid Rabiee ; Hongyuan Zha

In real-world social networks, there are multiple cascades, which are rarely independent. They usually compete or cooperate with each other. Motivated by the reinforcement theory in sociology, we leverage the fact that a user's adoption of any behavior can be modeled by the aggregation of the behaviors of its neighbors. We use a multidimensional marked Hawkes process to model users' product adoption and, consequently, the spread of cascades in social networks. The resulting inference problem is proved to be convex and is solved in parallel by using the barrier method. The advantage of the proposed model is twofold: it models correlated cascades and also learns the latent diffusion network. Experimental results on synthetic data and two real datasets, gathered from Twitter, URL shortening and music streaming services, illustrate the superior performance of the proposed model over the alternatives.
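As a rough illustration of how a multidimensional Hawkes process aggregates neighbours' past behaviour, the sketch below evaluates one user's conditional intensity. The exponential decay kernel, the omission of marks, and the names `mu`, `alpha` and `beta` are assumptions made for this example; the abstract does not specify the kernel or the parameterisation.

```python
from math import exp

def hawkes_intensity(t, user, events, mu, alpha, beta=1.0):
    """Conditional intensity of `user` at time t (assumed exponential kernel).

    events -- list of (time, source_user) pairs for past adoptions
    mu     -- base rate per user
    alpha  -- alpha[src][dst] is the influence of src's events on dst
    beta   -- decay rate of the influence
    """
    rate = mu[user]
    for t_k, src in events:
        if t_k < t:
            # Each earlier event adds an excitation that decays over time.
            rate += alpha[src][user] * exp(-beta * (t - t_k))
    return rate

# User 0 adopts at time 0 and influences user 1 with weight 0.5.
events = [(0.0, 0)]
mu = [0.1, 0.2]
alpha = [[0.0, 0.5], [0.0, 0.0]]
print(hawkes_intensity(1.0, 1, events, mu, alpha))
```

A positive `alpha` entry lets one user's adoption raise another user's adoption rate; the off-diagonal entries play the role of the latent diffusion network that the abstract says is learned.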

#8 Understanding the Semantic Structures of Tables with a Hybrid Deep Neural Network Architecture

Authors: Kyosuke Nishida ; Kugatsu Sadamitsu ; Ryuichiro Higashinaka ; Yoshihiro Matsuo

We propose a new deep neural network architecture, TabNet, for table type classification. Table type is essential information for exploring the power of Web tables, and it is important to understand the semantic structures of tables in order to classify them correctly. A table is a matrix of texts, analogous to an image, which is a matrix of pixels, and each text consists of a sequence of tokens. Our hybrid architecture mirrors the structure of tables: its recurrent neural network (RNN) encodes a sequence of tokens for each cell to create a 3D table volume like image data, and its convolutional neural network (CNN) captures semantic features, e.g., the existence of rows describing properties, to classify tables. Experiments using Web tables with various structures and topics demonstrated that TabNet achieved considerable improvements over state-of-the-art methods specialized for table classification and other deep neural network architectures.

#9 Learning Visual Sentiment Distributions via Augmented Conditional Probability Neural Network

Authors: Jufeng Yang ; Ming Sun ; Xiaoxiao Sun

Visual sentiment analysis is attracting more and more attention with the increasing tendency to express emotions through images. While most existing works assign a single dominant emotion to each image, we address the sentiment ambiguity by label distribution learning (LDL), which is motivated by the fact that an image usually evokes multiple emotions. Two new algorithms are developed based on the conditional probability neural network (CPNN). First, we propose BCPNN, which encodes the image label into a binary representation to replace the signless integers used in CPNN, and employs it as part of the input for the neural network. Then, we train our ACPNN model by adding noise to the ground-truth labels and augmenting affective distributions. Since current datasets are mostly annotated for single-label learning, we build two new datasets, one of which is relabeled on the popular Flickr dataset and the other is collected from Twitter. These datasets contain 20,745 images with multiple affective labels, which is over ten times larger than the existing ones. Experimental results show that the proposed methods outperform the state-of-the-art works on our large-scale datasets and other publicly available benchmarks.

#10 Semantic Proximity Search on Heterogeneous Graph by Proximity Embedding

Authors: Zemin Liu ; Vincent W. Zheng ; Zhou Zhao ; Fanwei Zhu ; Kevin Chen-Chuan Chang ; Minghui Wu ; Jing Ying

Many real-world networks have a rich collection of objects. The semantics of these objects allows us to capture different classes of proximities, thus enabling an important task of semantic proximity search. As the core of semantic proximity search, we have to measure the proximity on a heterogeneous graph, whose nodes are various types of objects. Most of the existing methods rely on engineering features about the graph structure between two nodes to measure their proximity. With recent development on graph embedding, we see a good chance to avoid feature engineering for semantic proximity search. There is very little work on using graph embedding for semantic proximity search. We also observe that graph embedding methods typically focus on embedding nodes, which is an "indirect" approach to learn the proximity. Thus, we introduce a new concept of proximity embedding, which directly embeds the network structure between two possibly distant nodes. We also design our proximity embedding, so as to flexibly support both symmetric and asymmetric proximities. Based on the proximity embedding, we can easily estimate the proximity score between two nodes and enable search on the graph. We evaluate our proximity embedding method on three real-world public data sets, and show it outperforms the state-of-the-art baselines.

#11 Transitive Hashing Network for Heterogeneous Multimedia Retrieval

Authors: Zhangjie Cao ; Mingsheng Long ; Jianmin Wang ; Qiang Yang

Hashing is widely applied to large-scale multimedia retrieval due to its storage and retrieval efficiency. Cross-modal hashing enables efficient retrieval of one modality from a database relevant to a query of another modality. Existing work on cross-modal hashing assumes that the heterogeneous relationship across modalities is available for learning to hash. This paper relaxes this strict assumption by only requiring a heterogeneous relationship in some auxiliary dataset different from the query or database domain. We design a novel hybrid deep architecture, transitive hashing network (THN), to jointly learn cross-modal correlation from the auxiliary dataset, and align the data distributions of the auxiliary dataset with that of the query or database domain, which generates compact transitive hash codes for efficient cross-modal retrieval. Comprehensive empirical evidence validates that the proposed THN approach yields state-of-the-art retrieval performance on standard multimedia benchmarks, i.e., NUS-WIDE and ImageNet-YahooQA.
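The efficiency argument for hashing can be illustrated with a small retrieval sketch: once items from both modalities are mapped to compact binary codes, ranking reduces to Hamming distance on integers. This shows only the generic retrieval step, not THN or how its codes are learned; the 4-bit codes, item names and function names are made up for the example.

```python
def hamming(a, b):
    """Number of differing bits between two codes stored as non-negative ints."""
    return bin(a ^ b).count("1")

def retrieve(query_code, database, top_k=2):
    """Return the top_k database keys closest to the query in Hamming distance."""
    ranked = sorted(database, key=lambda item: hamming(query_code, database[item]))
    return ranked[:top_k]

# Hypothetical 4-bit codes for three images and one text query.
image_codes = {"img_a": 0b1010, "img_b": 0b1011, "img_c": 0b0101}
text_query = 0b1010
print(retrieve(text_query, image_codes))
```

Storing each code in a machine word keeps the database small, and an XOR followed by a bit count is far cheaper than comparing real-valued feature vectors.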

#12 Community Preserving Network Embedding

Authors: Xiao Wang ; Peng Cui ; Jing Wang ; Jian Pei ; Wenwu Zhu ; Shiqiang Yang

Network embedding, aiming to learn the low-dimensional representations of nodes in networks, is of paramount importance in many real applications. One basic requirement of network embedding is to preserve the structure and inherent properties of the networks. While previous network embedding methods primarily preserve the microscopic structure, such as the first- and second-order proximities of nodes, the mesoscopic community structure, which is one of the most prominent features of networks, is largely ignored. In this paper, we propose a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate the community structure into network embedding. We exploit the consensus relationship between the representations of nodes and community structure, and then jointly optimize an NMF-based representation learning model and a modularity-based community detection model in a unified framework, which enables the learned representations of nodes to preserve both the microscopic and community structures. We also provide efficient updating rules to infer the parameters of our model, together with the correctness and convergence guarantees. Extensive experimental results on a variety of real-world networks show the superior performance of the proposed method over the state-of-the-art methods.
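For readers unfamiliar with the modularity objective mentioned above, the sketch below evaluates the standard Newman modularity of a given partition of an undirected, unweighted graph without self-loops. It only illustrates the quantity that a modularity-based community model seeks to maximise; it is not the M-NMF model, and the function name and example graph are hypothetical.

```python
def modularity(adj, community):
    """Newman modularity Q of a partition.

    adj       -- dict mapping each node to the set of its neighbours
    community -- dict mapping each node to a community label
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                a_ij = 1.0 if j in adj[i] else 0.0
                # Observed edge minus the expectation under random rewiring.
                q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

# Two triangles joined by the single edge 2-3.
graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(graph, labels))
```

Splitting the graph into its two triangles gives Q = 5/14, whereas placing all nodes in one community gives Q = 0, which is why the triangle split counts as the better community structure.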

#13 Joint Identification of Network Communities and Semantics via Integrative Modeling of Network Topologies and Node Contents

Authors: Dongxiao He ; Zhiyong Feng ; Di Jin ; Xiaobao Wang ; Weixiong Zhang

The objective of discovering network communities, an essential step in complex systems analysis, is two-fold: identification of functional modules and their semantics at the same time. However, most existing community-finding methods have focused on finding communities using network topologies. The problem of extracting module semantics has not been well studied, and node contents, which often contain semantic information of nodes and networks, have not been fully utilized. We considered the problem of identifying network communities and module semantics at the same time. We introduced a novel generative model with two closely correlated parts, one for communities and the other for semantics. We developed a co-learning strategy to jointly train the two parts of the model by combining a nested EM algorithm and belief propagation. By extracting the latent correlation between the two parts, our new method is not only robust for finding communities and semantics, but also able to provide more than one semantic explanation to a community. We evaluated the new method on artificial benchmarks and analyzed the semantic interpretability by a case study. We compared the new method with eight state-of-the-art methods on ten real-world networks, showing its superior performance over the existing methods.

#14 Web-Based Semantic Fragment Discovery for On-Line Lingual-Visual Similarity

Authors: Xiaoshuai Sun ; Jiewei Cao ; Chao Li ; Lei Zhu ; Heng Tao Shen

In this paper, we present an automatic approach for on-line discovery of visual-lingual semantic fragments from weakly labeled Internet images. Instead of learning region-entity correspondences from well-labeled image-sentence pairs, our approach directly collects and enhances the weakly labeled visual contents from the Web and constructs an adaptive visual representation which automatically links generic lingual phrases to their related visual contents. To ensure reliable and efficient semantic discovery, we adopt non-parametric density estimation to re-rank the related visual instances and propose a fast self-similarity-based quality assessment method to identify the high-quality semantic fragments. The discovered semantic fragments provide an adaptive joint representation for texts and images, based on which lingual-visual similarity can be defined for further co-analysis of heterogeneous multimedia data. Experimental results on semantic fragment quality assessment, sentence-based image retrieval, automatic multimedia insertion and ordering demonstrated the effectiveness of the proposed framework. The experiments show that the proposed methods can make effective use of the Web knowledge, and are able to generate competitive results compared to state-of-the-art approaches in various tasks.

#15 Exploiting both Vertical and Horizontal Dimensions of Feature Hierarchy for Effective Recommendation

Authors: Zhu Sun ; Jie Yang ; Jie Zhang ; Alessandro Bozzon

Feature hierarchy (FH) has proven to be effective in improving recommendation accuracy. Prior work mainly focuses on the influence of vertically affiliated features (i.e., child-parent) on user-item interactions. The relationships of horizontally organized features (i.e., siblings and cousins) in the hierarchy, however, have been little investigated. We show in real-world datasets that feature relationships in the horizontal dimension can help explain and further model user-item interactions. To fully exploit FH, we propose a unified recommendation framework that seamlessly incorporates both vertical and horizontal dimensions for effective recommendation. Our model further considers two types of semantically rich feature relationships in the horizontal dimension, i.e., complementary and alternative relationships. Extensive validation on four real-world datasets demonstrates the superiority of our approach against the state of the art. An additional benefit of our model is to provide better interpretations of the generated recommendations.

#16 A Declarative Approach to Data-Driven Fact Checking

Author: Julien Leblay

Fact checking is an essential part of any investigative work. For linguistic, psychological and social reasons, it is an inherently human task. Yet, modern media make it increasingly difficult for experts to keep up with the pace at which information is produced. Hence, we believe there is value in tools to assist them in this process. Much of the effort on Web data research has been focused on coping with incompleteness and uncertainty. Comparatively, dealing with context has received less attention, although it is crucial in judging the validity of a claim. For instance, what holds true in one US state might not hold in its neighbors, e.g., due to obsolete or superseded laws. In this work, we address the problem of checking the validity of claims in multiple contexts. We define a language to represent and query facts across different dimensions. The approach is non-intrusive and allows relatively easy modeling, while capturing incompleteness and uncertainty. We describe the syntax and semantics of the language. We present algorithms to demonstrate its feasibility, and we illustrate its usefulness through examples.

#17 Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems

Authors: Yishuang Ning ; Jia Jia ; Zhiyong Wu ; Runnan Li ; Yongsheng An ; Yanfeng Wang ; Helen Meng

Speech interaction systems have been gaining popularity in recent years. The main purpose of these systems is to generate more satisfactory responses according to users' speech utterances, in which the most critical problem is to analyze user intention. Research shows that user intention conveyed through speech is not only expressed by content, but also closely related to users' speaking manners (e.g., with or without acoustic emphasis). How to incorporate these heterogeneous attributes to infer user intention remains an open problem. In this paper, we define Intention Prominence (IP) as the semantic combination of focus by text and emphasis by speech, and propose a multi-task deep learning framework to predict IP. Specifically, we first use long short-term memory (LSTM), which is capable of modeling long short-term contextual dependencies, to detect focus and emphasis, and incorporate the tasks for focus and emphasis detection with multi-task learning (MTL) so that each reinforces the performance of the other. We then employ a Bayesian network (BN) to incorporate multimodal features (focus, emphasis, and location reflecting users' dialect conventions) to predict IP based on feature correlations. Experiments on a data set of 135,566 utterances collected from the real-world Sogou Voice Assistant illustrate that our method can outperform the comparison methods by 6.9-24.5% in terms of F1-measure. Moreover, a real-world deployment in the Sogou Voice Assistant indicates that our method can improve the performance on user intention understanding by 7%.

#18 Efficient Delivery Policy to Minimize User Traffic Consumption in Guaranteed Advertising

Authors: Jia Zhang ; Zheng Wang ; Qian Li ; Jialin Zhang ; Yanyan Lan ; Qiang Li ; Xiaoming Sun

In this work, we study the guaranteed delivery model, which is widely used in online advertising. In the guaranteed delivery scenario, ad exposures (which are also called impressions in some works) to users are guaranteed by contracts signed in advance between advertisers and publishers. A crucial problem for the advertising platform is how to fully utilize the valuable user traffic to generate as much revenue as possible. Different from previous works, which usually minimize the penalty of unsatisfied contracts and some other cost (e.g., representativeness), we propose the novel consumption minimization model, in which the primary objective is to minimize the user traffic consumed to satisfy all contracts. Under this model, we develop a near-optimal method to deliver ads for users. The main advantage of our method is that it consumes nearly the least possible user traffic to satisfy all contracts, so more contracts can be accepted to produce more revenue. It also enables the publishers to estimate how much user traffic is redundant or short so that they can sell or buy this part of traffic in bulk in the exchange market. Furthermore, it is robust with regard to prior knowledge of the user type distribution. Finally, the simulation shows that our method outperforms the traditional state-of-the-art methods.

#19 Marrying Uncertainty and Time in Knowledge Graphs

Authors: Melisachew Chekol ; Giuseppe Pirrò ; Joerg Schoenfisch ; Heiner Stuckenschmidt

The management of uncertainty is crucial when harvesting structured content from unstructured and noisy sources. Knowledge Graphs (KGs) are a prominent example. KGs maintain both numerical and non-numerical facts, with the support of an underlying schema. These facts are usually accompanied by a confidence score that indicates how likely they are to hold. Despite their popularity, most existing KGs focus on static data, thus impeding the availability of timewise knowledge. What is missing is a comprehensive solution for the management of uncertain and temporal data in KGs. The goal of this paper is to fill this gap. We rely on two main ingredients. The first is a numerical extension of Markov Logic Networks (MLNs) that provides the necessary underpinning to formalize the syntax and semantics of uncertain temporal KGs. The second is a set of Datalog constraints with inequalities that extend the underlying schema of the KGs and help to detect inconsistencies. From a theoretical point of view, we discuss the complexity of two important classes of queries for uncertain temporal KGs: maximum a-posteriori and conditional probability inference. Due to the hardness of these problems and the fact that MLN solvers do not scale well, we also explore the usage of Probabilistic Soft Logics (PSL) as a practical tool to support our reasoning tasks. We report on an experimental evaluation comparing the MLN and PSL approaches.

#20 Read the Silence: Well-Timed Recommendation via Admixture Marked Point Processes

Authors: Hideaki Kim ; Tomoharu Iwata ; Yasuhiro Fujiwara ; Naonori Ueda

Everything has its time, which is also true in the point-of-interest (POI) recommendation task. A truly intelligent recommender system, even if you don't visit any sites or remain silent, should draw hints of your next destination from the "silence", and revise its recommendations as needed. In this paper, we construct a well-timed POI recommender system that updates its recommendations in accordance with the silence, the temporal period in which no visits are made. To achieve this, we propose a novel probabilistic model to predict the joint probabilities of the user visiting POIs and their time-points, by using the admixture or mixed-membership structure to extend marked point processes. With the admixture structure, the proposed model obtains a low dimensional representation for each user, leading to robust recommendation against sparse observations. We also develop an efficient and easy-to-implement estimation algorithm for the proposed model based on collapsed Gibbs and slice sampling. We apply the proposed model to synthetic and real-world check-in data, and show that it performs well in the well-timed recommendation task.

#21 TweetFit: Fusing Multiple Social Media and Sensor Data for Wellness Profile Learning

Authors: Aleksandr Farseev ; Tat-Seng Chua

Wellness is a widely popular concept that is commonly applied to fitness and self-help products or services. Inference of personal wellness-related attributes, such as body mass index or disease tendency, as well as understanding of global dependencies between wellness attributes and users' behavior, is of crucial importance to various applications in personal and public wellness domains. Meanwhile, the emergence of social media platforms and wearable sensors makes it feasible to perform wellness profiling for users from multiple perspectives. However, research efforts on wellness profiling and integration of social media and sensor data are relatively sparse, and this study represents one of the first attempts in this direction. Specifically, to infer personal wellness attributes, we propose a multi-source individual user profile learning framework named "TweetFit". "TweetFit" can handle data incompleteness and perform wellness attribute inference from sensor and social media data simultaneously. Our experimental results show that the integration of the data from sensors and multiple social media sources can substantially boost the wellness profiling performance.

#22 Random-Radius Ball Method for Estimating Closeness Centrality

Authors: Wataru Inariba ; Takuya Akiba ; Yuichi Yoshida

In the analysis of real-world complex networks, identifying important vertices is one of the most fundamental operations. A variety of centrality measures have been proposed and extensively studied in various research areas. Many distance-based centrality measures have issues in treating disconnected networks, which are resolved by the recently proposed harmonic centrality. This paper focuses on a family of centrality measures including the harmonic centrality and its variants, and addresses their computational difficulty on very large graphs by presenting a new estimation algorithm named the random-radius ball (RRB) method. The RRB method is easy to implement, and a theoretical analysis, which includes the time complexity and error bounds, is also provided. The effectiveness of the RRB method over existing algorithms is demonstrated through experiments on real-world networks.
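To show why harmonic centrality copes with disconnected networks, the sketch below computes it exactly for one vertex with a breadth-first search, summing the reciprocals of shortest-path distances so that unreachable vertices simply contribute nothing. This is the exact baseline computation for an unweighted graph, not the RRB estimator; the function name and example graph are hypothetical.

```python
from collections import deque

def harmonic_centrality(adj, source):
    """Sum of 1/d(source, v) over all vertices v reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    total = 0.0
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                total += 1.0 / dist[u]  # unreachable vertices are never visited
                queue.append(u)
    return total

# Path 0-1-2 plus an isolated vertex 3.
graph = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
print([harmonic_centrality(graph, v) for v in sorted(graph)])
```

Running one search per vertex costs time proportional to the number of vertices times the number of edges over the whole graph, which is the computational difficulty on very large graphs that motivates an estimation algorithm.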

#23 A Dependency-Based Neural Reordering Model for Statistical Machine Translation

Authors: Christian Hadiwinoto ; Hwee Tou Ng

In machine translation (MT) that involves translating between two languages with significant differences in word order, determining the correct word order of translated words is a major challenge. The dependency parse tree of a source sentence can help to determine the correct word order of the translated words. In this paper, we present a novel reordering approach utilizing a neural network and dependency-based embeddings to predict whether the translations of two source words linked by a dependency relation should remain in the same order or should be swapped in the translated sentence. Experiments on Chinese-to-English translation show that our approach yields a statistically significant improvement of 0.57 BLEU points on benchmark NIST test sets, compared to our prior state-of-the-art statistical MT system that uses sparse dependency-based reordering features.

#24 POI2Vec: Geographical Latent Representation for Predicting Future Visitors

Authors: Shanshan Feng ; Gao Cong ; Bo An ; Yeow Meng Chee

With the increasing popularity of location-aware social media applications, Point-of-Interest (POI) recommendation has recently been extensively studied. However, most of the existing studies approach the problem from the users' perspective, namely recommending POIs for users. In contrast, we consider a new research problem of predicting users who will visit a given POI in a given future period. The challenge of the problem lies in the difficulty of effectively learning POI sequential transition and user preference, and integrating them for prediction. In this work, we propose a new latent representation model POI2Vec that is able to incorporate the geographical influence, which has been shown to be very important in modeling user mobility behavior. Note that existing representation models fail to incorporate the geographical influence. We further propose a method to jointly model the user preference and POI sequential transition influence for predicting potential visitors for a given POI. We conduct experiments on 2 real-world datasets to demonstrate the superiority of our proposed approach over the state-of-the-art algorithms for both next POI prediction and future user prediction.

#25 Visual Sentiment Analysis by Attending on Local Image Regions

Authors: Quanzeng You ; Hailin Jin ; Jiebo Luo

Visual sentiment analysis, which studies the emotional response of humans to visual stimuli such as images and videos, has been an interesting and challenging problem. It tries to understand the high-level content of visual data. The success of current models can be attributed to the development of robust algorithms from computer vision. Most of the existing models try to solve the problem by proposing either robust features or more complex models. In particular, visual features from the whole image or video are the main proposed inputs. Little attention has been paid to local areas, which we believe are highly relevant to the human emotional response to the whole image. In this work, we study the impact of local image regions on visual sentiment analysis. Our proposed model utilizes the recently studied attention mechanism to jointly discover the relevant local regions and build a sentiment classifier on top of these local regions. The experimental results suggest that 1) our model is capable of automatically discovering sentimental local regions of given images and 2) it outperforms existing state-of-the-art algorithms for visual sentiment analysis.